3 research outputs found

A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese

Author: Aluisio Sandra M.
Candido Jr. Arnaldo
Duran Magali S.
Hartmann Nathan S.
Paetzold Gustavo H.
Santos Leandro B. dos
Publication date: 19/05/2017

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective and must be gathered through costly and time-consuming surveys. Recent approaches use the limited datasets of psycholinguistic properties to extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those obtained in related work. The resulting resource contains 26,874 words in BP annotated with concreteness, age of acquisition, imageability and subjective frequency. Comment: Paper accepted for TSD201

arXiv.org e-Print Archive

Crossref

Schistosomiasis mansoni in an area of low transmission: I. impact of control measures

Author: Arnaldo Etzel
BARBOSA F.S.
BLISS C. I.
COUTINHO B.
DIAS L.C.S.
DOUMENGE J.P.
FAROOQ M.
FRÓES E.
HIATT R.A.
JORDAN P.
KATZ N.
KLOOS H.
LEHMAN Jr. J.S.
LEHMANN E.L.
Luiz Candido de Souza Dias
Luiz Koodi Hotta
Oswaldo Marçal Júnior
POLDERMAN A.M.
Rosa Maria de Jesus Patucci
SMITH D.H.
SNEDECOR G.W.
Publication venue: 'FapUNIFESP (SciELO)'

Crossref

Rhetorical move detection in English abstracts: multi-label sentence classifiers and their annotated corpora

Author: Aluísio Sandra
Candido Jr Arnaldo
Copestake Ann
Dayrell Carmen
Feltrim Valéria
Lima Gabriel
Machado Jr Danilo
Tagnin Stella
Publication date: 01/01/2012

The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can only be assigned one single label. However, such an approach does not adequately reflect actual language use, since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier which automatically identifies rhetorical moves in abstracts but allows a given sentence to be assigned as many labels as appropriate. We have resorted to various other NLP tools and used two large training corpora: (i) one corpus consists of 645 abstracts from physical sciences and engineering (PE) and (ii) the other corpus is made up of 690 abstracts from life and health sciences (LH). This paper presents our preliminary results and also discusses the various challenges involved in multi-label tagging and works towards satisfactory solutions. In addition, we also make our two training corpora publicly available so that they may serve as a benchmark for this new task.

Lancaster E-Prints